
Feat: Adding a CNI Telemetry sidecar to CNS pod to replace the azure-vnet-telemetry binary #3824


Status: Open (3 commits, base: master)

behzad-mir (Contributor) commented Jul 14, 2025

Adding a CNI Telemetry sidecar to CNS pod to replace the azure-vnet-telemetry binary.

Requirements:

Notes:

Copilot AI review requested due to automatic review settings July 14, 2025 16:17
behzad-mir requested a review from a team as a code owner July 14, 2025 16:17
behzad-mir requested a review from csfmomo July 14, 2025 16:17
Copilot AI (Contributor) left a comment

Pull Request Overview

This PR fixes the CNI telemetry service initialization issue where it was failing to start due to an empty Application Insights instrumentation key. The fix ensures that the telemetry service uses the proper AI configuration from the CNS configuration instead of an empty basic configuration.

  • Updates the startTelemetryService function to accept CNS configuration and properly initialize AI telemetry settings
  • Adds validation to check for Application Insights instrumentation key before starting the telemetry service
  • Introduces proper telemetry configuration mapping from CNS settings to AI config

@@ -94,6 +94,7 @@ const (
// Service name.
name = "azure-cns"
pluginName = "azure-vnet"
aiPluginName = "AzureCNI"
Copilot AI commented Jul 14, 2025

There's a typo in the PR title 'Telemtry' which should be 'Telemetry'. While this constant name is correct, ensure consistency in naming throughout the codebase.


behzad-mir added the do-not-merge and cni (Related to CNI.) labels Jul 14, 2025
tamilmani1989 requested review from QxBytes and rbtr July 14, 2025 16:58
Comment on lines 497 to 500
if cnsconfig.TelemetrySettings.AppInsightsInstrumentationKey != "" {
  err = tb.CreateAITelemetryHandle(aiConfig, ts.DisableTrace, ts.DisableMetric, ts.DisableEvent)
} else {
  logger.Printf("No Application Insights key provided for CNI telemetry service")
Member

Should we invert the check and remove the else?

if cnsconfig.TelemetrySettings.AppInsightsInstrumentationKey == "" {
  logger.Printf("No Application Insights key provided for CNI telemetry service")
  return
}
err = tb.CreateAITelemetryHandle(aiConfig, ts.DisableTrace, ts.DisableMetric, ts.DisableEvent)
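
The suggestion above is the usual Go guard-clause shape: handle the empty-key case first and return, so the main path is not nested under an else. A minimal, self-contained sketch of that shape follows. The `telemetrySettings` struct, the `createHandle` callback, and `startTelemetry` are hypothetical stand-ins for illustration, not the actual CNS or telemetry buffer APIs.

```go
package main

import (
	"errors"
	"fmt"
)

// telemetrySettings is a hypothetical stand-in for the CNS telemetry config.
type telemetrySettings struct {
	AppInsightsInstrumentationKey string
}

// errNoInstrumentationKey reports that telemetry was skipped because no key
// was configured.
var errNoInstrumentationKey = errors.New("no Application Insights key provided")

// startTelemetry returns early when the key is empty; otherwise it calls
// createHandle, a hypothetical callback standing in for handle creation.
func startTelemetry(ts telemetrySettings, createHandle func(key string) error) error {
	if ts.AppInsightsInstrumentationKey == "" {
		return errNoInstrumentationKey
	}
	if err := createHandle(ts.AppInsightsInstrumentationKey); err != nil {
		return fmt.Errorf("creating AI telemetry handle: %w", err)
	}
	return nil
}

func main() {
	noop := func(string) error { return nil }
	fmt.Println(startTelemetry(telemetrySettings{}, noop))
	fmt.Println(startTelemetry(telemetrySettings{AppInsightsInstrumentationKey: "example-key"}, noop))
}
```

Unlike the snippet in the comment, which logs and returns, this sketch returns a sentinel error so the caller decides whether a missing key is fatal. That is one possible design choice, not something the thread settles.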

santhoshmprabhu (Contributor) commented Jul 14, 2025

Discussed in the team meeting today - we may want to consider doing telemetry for CNI directly in CNI rather than CNS, to account for scenarios where CNI can't talk to CNS.
Tagging @tamilmani1989 as decided in the call.

behzad-mir force-pushed the telemetry-statelessCNIFix branch from 2389f47 to df332ec Compare July 22, 2025 05:10
behzad-mir requested a review from a team as a code owner July 22, 2025 05:10
behzad-mir force-pushed the telemetry-statelessCNIFix branch 5 times, most recently from dde3441 to b59b87b Compare July 22, 2025 06:37
behzad-mir changed the title from "fix: Fixing CNI Telemtry Service run by CNS" to "Feat: Adding a CNI Telemetry sidecar to CNS pod to replace the azure-vnet-telemetry binary" Jul 22, 2025
QxBytes (Contributor) left a comment

Based on my understanding, this PR replaces the azure-vnet-telemetry binary on the node with a sidecar that does the same thing, so CNI would send AI events to this sidecar via a socket if it is present (rather than to the long-running azure-vnet-telemetry process). I believe azure-vnet-telemetry is still on the VHD (?), so CNI will still attempt to start that binary as a service. Can you check that there's no race or other issue with both CNS and Azure CNI trying to start the telemetry service at the same time?
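
One common way to rule out this kind of double start is to make "claiming" the service an atomic operation, so that only one starter can win. The sketch below uses an exclusive lock-file create for that. It is purely illustrative: the lock path and the `tryAcquireStartLock` helper are assumptions, and nothing in this thread says that CNS or Azure CNI coordinate this way.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// tryAcquireStartLock atomically creates lockPath. O_EXCL makes creation fail
// if the file already exists, so at most one caller sees acquired == true.
// The lock-file scheme itself is an illustrative assumption.
func tryAcquireStartLock(lockPath string) (release func() error, acquired bool, err error) {
	f, err := os.OpenFile(lockPath, os.O_CREATE|os.O_EXCL|os.O_WRONLY, 0o600)
	if errors.Is(err, os.ErrExist) {
		// Another process already claimed the right to start the service.
		return nil, false, nil
	}
	if err != nil {
		return nil, false, err
	}
	if err := f.Close(); err != nil {
		return nil, false, err
	}
	return func() error { return os.Remove(lockPath) }, true, nil
}

func main() {
	const lockPath = "telemetry-start.lock" // hypothetical path

	release, ok, err := tryAcquireStartLock(lockPath)
	if err != nil {
		fmt.Println("lock error:", err)
		return
	}
	if !ok {
		fmt.Println("telemetry service already claimed; not starting a second one")
		return
	}
	defer release()
	fmt.Println("lock acquired; safe to start the telemetry service")
}
```

A plain lock file goes stale if the holder crashes before releasing it, so a real implementation would need some form of recovery (for example, checking whether the recorded owner is still alive). That part is left out here.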

CNI_TELEMETRY_SIDECAR_DIR = $(REPO_ROOT)/cns/cni-telemetry-sidecar
CNI_TELEMETRY_SIDECAR_BUILD_DIR = $(BUILD_DIR)/cni-telemetry-sidecar
CNI_TELEMETRY_SIDECAR_AI_ID = $(CNI_AI_ID) # Reuse CNI AI ID
CNI_TELEMETRY_SIDECAR_VERSION = $(CNS_VERSION) # Version follows CNS
Contributor

Can we move the version to where all the other version variables are defined?

Contributor

Also, if you intend this to have its own release versions like azure-ip-masq-merger, you'll need to add an exclusion for it to the version tag above: ACN_VERSION ?= $(shell git describe --exclude "azure-ip-masq-merger*" --exclude "azure-ipam*" --exclude "dropgz*" --exclude "zapai*" --exclude "ipv6-hp-bpf*" --tags --always)

@@ -123,7 +128,7 @@ all-binaries-platforms: ## Make all platform binaries

# OS specific binaries/images
ifeq ($(GOOS),linux)
-all-binaries: acncli azure-cni-plugin azure-cns azure-npm azure-ipam azure-ip-masq-merger ipv6-hp-bpf
+all-binaries: acncli azure-cni-plugin azure-cns azure-npm azure-ipam azure-ip-masq-merger ipv6-hp-bpf cni-telemetry-sidecar
all-images: npm-image cns-image cni-manager-image azure-ip-masq-merger-image ipv6-hp-bpf-image
Contributor

It looks like you're making a new image. If this is a new sidecar container image, there are other Dockerfiles/scripts you may need to edit (e.g., for the signed pipeline). Would the image target be added here?

@@ -275,6 +284,7 @@ CNI_IMAGE = azure-cni
CNS_IMAGE = azure-cns
NPM_IMAGE = azure-npm
AZURE_IP_MASQ_MERGER_IMAGE = azure-ip-masq-merger
CNI_TELEMETRY_SIDECAR_IMAGE = azure-cni-telemetry-sidecar
Contributor

Not sure if you want to keep the variable naming consistent: it's CNI_TELEMETRY here but the image name is azure-cni-telemetry-sidecar. Is the "sidecar" part of the name necessary (here and elsewhere)? If it's a container image, I feel it's already implied that it will run as a sidecar or similar.

# Create a CNI Telemetry Sidecar archive for the target platform.
.PHONY: cni-telemetry-sidecar-archive
cni-telemetry-sidecar-archive: cni-telemetry-sidecar-binary
ifeq ($(GOOS),linux)
Contributor

If this binary is built for both Windows and Linux, should we omit this Linux check?


// Validate batch interval for optimal Azure ingestion
if config.TelemetrySettings.TelemetryBatchIntervalInSecs <= 0 {
  logger.Printf("Warning: Invalid telemetry batch interval, using default")
Contributor

Does AI automatically fall back to the default under the hood when the value is invalid, or should this code also set the value to the default? (This also applies to the conditional above.)
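
If the AI client turns out not to substitute its own default (the thread does not establish this either way), the validation would have to overwrite the value rather than only log a warning. A minimal sketch of that variant follows. `defaultBatchIntervalSecs` and `normalizeBatchInterval` are hypothetical names, and the value 30 is a placeholder, not the project's actual default.

```go
package main

import "fmt"

// defaultBatchIntervalSecs is a hypothetical placeholder; the real default is
// not stated in this thread.
const defaultBatchIntervalSecs = 30

// normalizeBatchInterval returns a usable interval, substituting the default
// for non-positive values, and reports whether a substitution happened so the
// caller can log a warning.
func normalizeBatchInterval(secs int) (normalized int, usedDefault bool) {
	if secs <= 0 {
		return defaultBatchIntervalSecs, true
	}
	return secs, false
}

func main() {
	for _, configured := range []int{0, -5, 15} {
		effective, usedDefault := normalizeBatchInterval(configured)
		fmt.Printf("configured=%d effective=%d usedDefault=%t\n", configured, effective, usedDefault)
	}
}
```

Returning the effective value keeps the warning message ("using default") truthful, because the default is what the caller then passes on.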

flag.Parse()

// Initialize logging for the CNI telemetry sidecar
logger.InitLogger("azure-cns-cni-telemetry-sidecar", 1, 1, "/var/log/azure-cns-telemetry")
Contributor

Should the naming be consistent: is this CNS or CNI telemetry? Below, the logger prints that it's the Azure CNI telemetry sidecar, but the log goes to the file azure-cns-telemetry.

const (
// CNI telemetry constants aligned with azure-vnet-telemetry
cniTelemetryAppName = "azure-vnet-telemetry"
cniTelemetryVersion = "1.0.0"
Contributor

nit: do we want to start with the 0.0.x release train or 1.0.x?

behzad-mir force-pushed the telemetry-statelessCNIFix branch from b59b87b to fff55be Compare July 22, 2025 17:28
behzad-mir force-pushed the telemetry-statelessCNIFix branch from fff55be to 6e4257f Compare July 22, 2025 18:00
Labels: cni (Related to CNI.), do-not-merge

4 participants